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We propose a universal approach for analysis and fast simulations of stiff stochastic bio- 
chemical kinetics networks, which rests on elimination of fast chemical species without a 
loss of information about mesoscopic, non-Poissonian fluctuations of the slow ones. Our 
approach, which is similar to the Born-Oppenheimcr approximation in quantum mechanics, 
follows from the stochastic path integral representation of the full counting statistics of reac- 
tion events (also known as the cumulant generating function). In applications with a small 
number of chemical reactions, this approach produces analytical expressions for moments 
of chemical fluxes between slow variables. This allows for a low-dimensional, interpretable 
representation of the biochemical system, that can be used for coarse-grained numerical 
simulation schemes with a small computational complexity and yet high accuracy. As an 
example, we consider a chain of biochemical reactions, derive its coarse-grained descrip- 
tion, and show that the Gillespie simulations of the original stiff system, the coarse-grained 
simulations, and the full analytical treatment are in an agreement, but the coarse-grained 
simulations are three orders of magnitude faster than the Gillespie analogue. 
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I. INTRODUCTION 



Single molecule biochemical experiments provide highly detailed knowledge about the mean 
time between successive reaction events and hence about the reaction rates. Additionally they 
deliver qualitatively new information, inaccessible to bulk experiments, by measuring other re- 
actions statistics, such as variances and autocorrelations of successive reaction times^^EEE]_ j n 
their turn, these quantities relate to structural properties of the reaction networks, uncovering such 
phenomena as internal enzyme states or multi-step nature of seemingly simple reactions, and hence 
starting a new chapter in the studies of the complex biochemistry that underlies cellular regulation, 
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signaling, and metabolism. 

However, the bridge between the experimental data and the network properties is not trivial. 
Since the class of exactly solvable biologically relevant models is limitecPEEl, exact analytical calcu- 
lations of statistical properties of reactions are impossible even for some of the simplest networks. 
Similarly, the variational approac h 1 10 ^ 11 ! and other analytical approximations are of little help when 
the experimentally observed quantities depend on features that are difficult to approximate, such as 
the tails of the reaction events distributions. Therefore, computer simulations are often the method 
of choice to explore an agreement between a presumed model and the observed experimental data. 

Unfortunately, even the simplest biochemical simulations often face serious problems, both con- 
ceptual and practical. First, the networks usually involve combinatorially many molecular species 
and elementary reaction processes: for example, a single molecular receptor with n modification 
sites can exist in 2 n states, and an even larger number of reactions connecting thenP^. Second, 
while it is widely known that some molecules occur in the cell at very low copy numbers (e.g., a 
single copy of the DNA) , which give rise to relatively large stochastic fluctuation d 13 ! 14 ! 15 ! 16 ! 17 * 1 ^ ^, 
it is less appreciated that the combinatorial complexity makes this true for almost all molecular 
species. Indeed, complex systems with a large number of molecules, like in eukaryotic cells, may 
have small abundances of typical microscopic species if the number of the species is combinatorially 
large. Third, and perhaps the most profound difficulty of the "understanding-through-simulations" 
approach, is that only very few of the kinetic parameters underlying the combinatorially complex, 
stochastic biochemical networks are experimentally observed or even observable. For example, the 
average rate of phosphorylation of a receptor on a particular residue can be measured, but it is 
hopeless to try to determine the rate for each of the individual microscopic states of the molecule 
determined by its modification on each of the other available sites. 

While some day computers may be able to tackle the formidable problem of modeling astronomi- 
cally large biochemical networks as a series of random discrete molecular reaction events (which will 
properly account for stochastic copy number fluctuations), and then performing sweeps through 
parameter spaces in search of an agreement with experiments, such powerful computers are still 
far away. More importantly, even if this computational ability were available, it would not help 
in building a comprehensible, tractable interpretation of the modeled biological processes and in 
identifying connections between microscopic features and macroscopic parameters of the networks. 

Clearly, such an interpretation can be aided by simplifying the networks through coarse-graining, 
that is, by merging or eliminating certain nodes and/or reaction processes. Ideally, as in Fig. [TJ 
one wants to substitute a whole network of elementary (that is, single-step, Poisson-distributed) 
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FIG. 1: (a) A complex network of elementary reactions connecting the input and the output nodes. Note that 
the choice of these nodes usually distinguishes different coarse-graining schemes, and it is rather arbitrary. 
In our work, the choice is determined by the adiabatic time scale separation, as described in Methods. In 
principle, such networks can be coarse-grained by multiple methods. In (b) we illustrate the decimation 
procedure, where intermediate nodes with fast dynamics get eliminated successively, resulting in complex 
reactions connecting their immediate neighbors; the statistics of these complex reactions are determined 
by the cumulant generating functions (CGF) <S(x), cf. Results. Other coarse-graining schemes are possible. 
For example, (c) nodes can be merged in hyper-nodes, again connected to each other by complex reactions. 
Combinations of the strategies are also allowed. Panels (b) and (c) resemble the decimation and the blocking 
procedures in statistical physics^, and, not coincidentally, statistical physics is the field where coarse- 
graining has had the biggest impact and is the most developed^. Both the decimation and the blocking 
are in the spirit of the real-space renormalization group on an irregular lattice, and one can also think of 
momentum-space-like approaches as a complement)^. 

biochemical reactions with a few complex reaction links connecting the species that survive the 
coarse-graining in a way that retains predictability of the system. Not incidentally, this would 
also help with each of the three major roadblocks mentioned above, by reducing the number 
of interacting elements, increasing the number of molecules in agglomerated hyperspecies, and 
combining multiple features into a much smaller number of effective, mesoscopic kinetic parameters. 

The importance of coarse-graining in biochemistry has been understood since 1913 2 2 when the 
first deterministic coarse-graining mechanism, now known as the Michaelis-Menten (MM) enzyme, 
was proposed for the following kinetic scheme: 

k-i[C] 

Here ki, k2, and fc_ i are kinetic rates, S, P, E, and C denote the substrate, the product, the 
enzyme, and the enzyme-substrate complex molecules, respectively, and [. . . ] represent the abun- 
dances. The enzyme catalyzes the S — > P transformation by merging with S to create an unstable 
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complex C, which then dissociates either back into E + S or forward into E + P, leaving E un- 
modified. If [S] 3> [E], then the enzyme cycles many times before [S] and [P] change appreciably. 
This allows to simplify the enzyme-mediated dynamics by assuming that the enzymes equilibrate 
quickly at the current substrate concentration, resulting in a coarse-grained, complex reaction with 
decimated enzyme species: 

S^P k2[S][E] (2) 

However, this simple reduction is insufficient when stochastic effects are important: each com- 
plex MM reaction consists of multiple elementary steps, thus the statistics of the number of 
MM reactions per unit time, in general, is non-Poissonian. While some relatively successful at- 
tempts have been undertaken to extend simple deterministic coarse-graining to the stochastic 
domairP^ElESESEZl^ a general set of tools for coarse-graining large biochemical networks has not 
been found yet. 

In this article, we propose a method for a systematic rigorous coarse-graining of stochastic 
biochemical networks, which can be viewed as a step towards creation of comprehensive coarse- 
grained analysis tools. We start by noting that, in addition to the conceptual problems mentioned 
above, a technical difficulty stands in the way of stochastic simulations methods in systems biology: 
molecular species and reactions have very different dynamical time scales, which makes biochemical 
networks stiff and difficult to simulate. Here we propose to use this property of separation of time 
scales to our advantage. 

The idea is not new, and multiple related approaches have been proposed in the literature, 
differing from each other mainly in the definition of fast and slow variables. A common practice is to 
use reaction rates to identify fast and slow reaction d 23 l 24 * 25 [ However, if two species of very different 
typical abundances are coupled by one reaction, then a relatively small change in the concentration 
of the high abundance species can have a dramatic effect on that of the low abundance one. This 
notion of species-based rather than reaction-based adiabaticity has been used in the original MM 
derivation, and it is also at the heart of our arguments. 

Our method builds upon the stochastic path integral technique from mesoscopic 
physics^ 8 * 29 * 30 * 31 !, providing three major improvements that make the approach more applicable 
to biological networks. First, we extend the method, initially developed for large copy number 
species, to deal with simple discrete degrees of freedom, such as a single MM enzyme or a single 
gene. Second, we explain how to apply the technique to a network of multiple reactions, thereby 
reducing the entire network to a single complex reaction step. Finally, we show how the proce- 
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dure can be turned into an efficient algorithm for simulations of coarse-grained networks, while 
preserving important statistical characteristics of the original dynamics. The algorithm is akin to 
the LangevirJ321 Q r r-leapingSSESl schemes, widely used in biochemical simulations, but it allows to 
simulate an entire complex reaction in a single step. We believe that this development of a fast, 
yet precise simulation algorithm is the most important practical contribution of this manuscript. 

For pedagogical reasons, we develop the method using a model system that is simple enough for 
a detailed analysis, yet is complex enough to support our goals. A generalization to more complex 
systems is suggested in the Discussion. 

A. The model 

Consider the system in Fig. [2j an enzyme is attached to a membrane in a cell. Sb substrate 
molecules are distributed over the bulk cell volume. Each molecule can either be adsorbed by the 
membrane, forming the species Sm, ° r dissociate from it. Enzyme-substrate interactions are only 
possible in the adsorbed state. One can easily recognize this as an extremely simplified model of 
receptor mediated signaling, such as in visionP^ES or immune signaling. 

As usual, the enzyme-substrate complex C can split either into E + S or into E + P. Let's 
suppose that the latter reaction is observable; for example, a GFP-tagged enzyme sparks each time 
a product molecule is createcP. We further suppose the reaction C — > E + P is not reversible (that 
is, the product leaves the membrane or the reaction requires energy and is far from equilibrium). 

The full set of elementary reactions is 

1. adsorption of the bulk substrate onto the membrane, Sb — > Sm, with rate qoSb; 

2. reemission of the substrate back into the bulk, Sm — * Sb, with rate qSyi] 

3. Michaelis-Menten conversion of Sm m t° P, consisting of 

(a) substrate-enzyme complex formation, Sm + E — > C, with rate A^Sm; 

(b) complex backward decay, C — > Sm + E, with rate k-%; 

(c) product emission C — > E + P, with rate &2- 

Note that here and in the rest of the article, we drop the [. . . ] notation for denoting abundances 
and don't make a distinction between a species name and the number of its molecules. 

In this setup, only emission of the product is directly observable. Our goal is to coarse-grain 
the above system of five reaction processes into a single complex reaction Sb — ► P, as in Fig. Kmc). 
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FIG. 2: A single enzyme on a membrane, interacting with substrate molecules. Green, blue, and grey 
circles are bulk substrate, membrane substrate, and product molecules, respectively. Arrows represent 
possible reactions: (1,2) adsorption and dissociation of S by/from the membrane (orange); (3) multi-step 
MM conversion S — > P (red) . 

That is, we want to eliminate all intermediate species and reaction processes, while preserving their 
effects on the statistical properties of the complex reaction Sb — ► P on time scales appropriate for 
its dynamics. 

This set of reactions has another interpretation. Consider an MM enzyme in the bulk together 
with the substrate. When the substrate concentration is small, only few of its molecules are 
sufficiently close to the enzyme to interact with it. In this context, one can approximate the 
full reaction-diffusion setup by a system having an inner (reactive) and an outer (non-reactive) 
regions surrounding the enzyme. Diffusion takes substrate molecules between the regions with 
(almost) Poisson statistics of transitions, with the rate parameters depending on the volume of 
the regions and the diffusion coefficient. The particles in the inner region can interact with the 
MM enzyme, completing the mapping between the reaction-diffusion system and the multi-state 
well-mixed kinetic process described above. 

II. RESULTS 

There are three distinct effective time scales in the system in Fig. [2j One is the time scale tb 
of the variation of the bulk substrate abundance. We assume that Sb is much larger than Sm- 
Therefore, this time scale is the slowest, and we will be interested in studying the response of 



7 



Initial set of reactions 
qoS B 



S 



k,S M 

S M +E^ C 

k-i 



k2 



P+E 



<b) 



Step 1 



s M p 



(c) 



Step 2 



S(S B ,Xl 
S B » P 



FIG. 3: Coarse-graining of the model. In (a) we show the original set of reactions describing the system 
in Fig. [2j Panel (b) represents the set of reactions after the first coarse-graining step. Note that here the 
three stages of the MM enzyme have been replaced by a single complex reaction. Further, all the remaining 
(simple or complex) reactions are now represented by their slowly varying CGFs. Panel (c) shows the final 
reaction that describes the system at time scales much larger than the characteristic time of evolution of 
5m, the number of the substrate molecules on the membrane. The wavy line corresponds to a spark of the 
tracer molecule^, which counts the number of Sb — * P transformations. 



the system to the bulk substrate abundance Sb on this scale. A faster time scale is given by the 
dynamics of the molecules on the membrane, tm- Finally, at the other extreme, the fastest time 
scale, te, is set by single reaction events, that is, the characteristic time between two successive 
product releases by the enzyme. Overall, te <C tm <C tq. 

We emphasize again that all species in the problem are connected by reactions that happen with 
approximately the same rates, and the separation of the time scales is a direct result of the particle 
abundances, rather than the conversion speeds: it takes only a few reaction events to change a 
low-abundance species drastically, and a lot longer to do the same to a high-abundance one. This 
is the main reason why we believe that this illustrative model will shed light on coarse-graining of 
a wide class of networks. 

The hierarchy of times allows us to coarse-grain the system in two steps, as in Fig. |3| First, 
we remove the variable with the fastest dynamics, that is, the binary occupancy substrate-enzyme 
complex C. This replaces the three reactions of the MM mechanism with a single reaction Sm ~~ > P 
(Fig. |3|b)). Additionally, we represent the other reactions in the system in a form more suitable 
for the subsequent developments. In the second step, we eliminate the membrane-bound substrate 
variable, which evolves on the scale tm- This results in the characterization of the average flux 
and its fluctuations for Sb — > P transformation, treating Sb as a time-dependent input parameter, 
cf. Fig.^c). 
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A. Preliminaries 

Let 5Q[_i stand for the number of reaction events for the reaction type [i (in our example, 
/i = 1,2,3 corresponds to adsorption, detachment, and the whole MM reaction, respectively). 
Then P{5Qfj\T) is the probability distribution of the number of events of type [i during a temporal 
window of length T. Instead of considering these distributions directly, our derivation for the 
coarse-graining relies the corresponding moment generation functions (MGFs)^ 

oo 

ZpfcT) = e s ^ = P{5Q^\T)e iS ^. (3) 

<5Q M =0 

where Sa{x) 1S the cumulant generating function (CGF). The moments and the (complex) cumu- 
lants of the distribution P(SQ^\T) can be calculated by differentiating the MGF and the CGF, 
respectively 

ga 



d x a 

0" 



MX,T) (4) 

s,*(x,n (5) 



g x a 

where a stands for the order of the moment (cumulant). In particular, the average flux for the 
reaction \i is c^i = m^i, and the corresponding variance is er 2 = = "V,2 — m f l ,i- 

B. Step 1: The generating function representation 

This step can be viewed as a generalization of the standard r-leaping approximation®'^, which 
prescribes to simulate elementary, exponentially distributed reaction events, for example attach- 
ment/detachment reactions in Fig. [3^a), by choosing a time step such that the number of the 
reactions in it is much larger than unity, yet the reaction rates (determined by the dynamics of 
the slower variables in the problem) can be considered stationary. In a T-leaping scheme, one then 
approximates the distribution of the number of reaction events by a Poisson distribution. 

Unfortunately, not all biochemical processes can be treated in such a simple manner. For exam- 
ple, due to the single-copy nature of the MM enzyme in our system, Fig. [3]^a), the instantaneous 
rate of the product creation is a fast varying function of time, switching between zero and &?2 
every time binding/unbinding events happen. Therefore, one cannot use r-leaping or Langevin 
scheme^ 32 ! 33 ! 34 !, or treat the product creation as a homogeneous Poisson process. We would like to 
avoid being forced into the Gillespie^ or the StochSimPSl analysis schemes. Since either of these 
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schemes is based on Monte-Carlo simulations of every individual reaction event, the estimation of 
parameters of interest may become excessively slow in large systems. 

As an alternative, we will identify good approximations for the distribution of the number of 
reactions in a fixed time interval, which is no longer a Poisson distribution, by characterizing its 
CGF (see Methods: Moment generating functions for elementary chemical reactions). To this end, 
we propose to eliminate the binary substrate-enzyme complex variable and reduce the MM reaction 
triplet to a single reaction, whose dynamics can be considered stationary over times much longer 
than a single reaction event. This completes Step 1 of the coarse-graining in which each reaction, 
or a small complex of reactions, is subsumed by a CGF of the distribution of the number 
of events, which can be considered stationary for extended times. Importantly, in this Step, we 
removed the only species in the problem that exists, at most, in a single copy and hence is stiff. 
This dramatically simplifies simulations and analysis of the system. 

The details of this are given in Methods: Coarse- graining the Michaelis-Menten reaction. In 



particular, in Eq. (32) we derive 1S3, the CGF for the entire complex Michaelis-Menten reaction, 
eliminating the intermediate substrate-enzyme complex C. The expression is valid over times much 
larger than te, but smaller than tm, so that many enzyme turnovers happen, but the effect on the 
abundance of the membrane-bound substrate is still relatively small. 

In the MM mechanism, the backward reaction is often a simple dissociation, whereas the forward 
one requires crossing an energy barrier and is exponentially suppressed. As a result, one often has 
k-i S> &2, which can make the MM reaction doubly stiff, requiring multiple binding events (and 
with them the instantaneous rate changes) for each released product. Therefore, replacing the 
entire MM mechanism with a single complex reaction step has a dramatic effect on analysis of 
the reaction, and specifically on the simulation efficiency, which can now be performed using the 
Langevin-like quasi-stationary approximation. 



To illustrate this, using Eq. (32 ), we write the first few cumulants of the number of MM product 
releases per time 5t: 

M3 = klk ^ M 5t, K = k\Sy[ + &2 + fc— 1, (6) 
2 jp rp _ 2kxk 2 Su 



ai = » 3 F, F=\l- y 2 ) , (7) 

( 6Q(K-2Q) \ 

C 3 ,3 = M3 ( 1 ^ ) > Q = ^/fc, (8) 

/ 2Q{1K 2 - 36KQ + 60Q 2 \ 

C 3 ,4 = M3 [1 ^3 j ■ (9) 

The coefficient F in the expression for 03 is called the Fano factor (see below). To the extent that 
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FIG. 4: Distribution of the number of Michaelis-Menten reactions over a time 5t = 35 with 5m = 140, 
k\ = 0.01, fc_i = 1, and ki — 1 vs. the Poisson distribution with the same average number of reactions. 
The distribution for the MM process is obtained using the Gram-Charlier expansion with the four known 
cumulants, see Methods. 



a\ 7^ /is , this complex reaction is non-Poisson, as illustrated in Fig. |4 

Knowing cumulants of the reaction events distribution allows for a simple numerical simulation 
procedure 



SQa(t) = V3(t,5t), 
S M {t + St) = S M {t) - SQ 3 (t) + J(t)6t, 
P(t + St) = P(t) + SQ 3 (t), 



(10) 

(11) 

(12) 



where rjsit) is a random variable with the cumulants given by Eqs. (6|9), and J(t) represents cur- 
rents exogenous to the MM reaction, such as changes in Sm due to membrane binding/unbinding. 
Notice that, in this step, we are now treating the single-particle-mediated reaction in a quasi- 
stationary, r-leaping or Langevin-like way by drawing a (random) number of complex reaction 
events over a time 5t directly, assuming that all parameters defining the reaction are constants 
over this time. The price for the coarse graining is that instead of characterizing any reaction by a 
single rate that defines a Poisson distribution of reaction events, one is forced to use a distribution 
with the prescribed sequence of moments for the Monte-Carlo simulations. 

In principle, generation of such random variables is a difficult and an ill-posed task since the 
moments do not define the distribution uniquely, and two distributions with matched moments can 
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be arbitrarily different from each other. Additionally, once we allow for a nonzero third or fourth 
cumulant, the remaining higher order cumulants cannot be zero anymore^, and the generated 
random variable will depend on the assumptions made about them. Fortunately, in our case, the 
situation is simplified because all 03^ ~ 5t. Thus the fc'th cumulant will have a progressively smaller 
effect, ~ (<5i) 1//fc , on a number drawn from the distribution, and our random variables are almost 
Gaussian. Then the Gram-Charlier series expansion®! aided either by the importance or rejection 
sampling^ reduces the simulation scheme, Eqs. 
a Gaussian noise and a small penalty, as described in Methods: Simulations with near-Gaussian 
distributions; see also Fig. [8] in Methods for illustration of the precision provided by these tools. 
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C. Step 2: Coarse-graining membrane reactions 

For Step 2 of our coarse-graining approach, we are given the CGFs 5 M , /x = 1, 2, 3, of the slowly 
varying reactions. Using the stochastic path integral technique, we then express the CGF of the 
entire coarse-grained reaction 5b — > P in Fig. (3^c) in terms of the component CGFs, and then 
simplify the expression to account for the time scale separation between the dynamics of 5b and 



5m- This is presented in detail in Methods: Coarse-graining all membrane reactions, cf. Eq. (46). 
This formally completes the coarse-graining. That is, we find the CGF of the 5b — ► P particle flux 
for times T < tb, much longer than te and tm, the other time scales in the problem. 

The resulting CGF depends on microscopic reaction rates, which can depend on slow parameters, 
such as 5b- The full expression for CGF is cumbersome and non- illuminating. Fortunately, we 
only want to calculate the first few cumulants of the reaction events distribution, and these are 
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obtained by differentiation the CGF as in Eq. (|5j. This gives 
„ 1 



ci 



C2 



2fc 



/ci(A;o + ^2) + q{k 2 + k- 



'fc?(*0 " ^ 2 ) 2 + 2fcig(fc + fc 2 )(A; 2 + + g 2 (fc 2 + fc_!) 2 
Fci, 
F = 1 



\kok 2 + fci(fco + k 2 )k-\ + qk-i(k 2 + fc-i)) 



fc?(*0 - ^) 2 + 2A;ig(A;o + A; 2 )(fc 2 + fc_i) + g 2 (& 2 + 



+ 



+ 



C.3 



-T- 



^(fco - ^) 2 + 2feig(fe + A; 2 )(fc 2 + /c_i) + g 2 (/c 2 + &-1) 2 
^ {k 5 ^ - p 10 + k P 7 [5kjk 2 + q (llfti + 6g) a] 



(13) 
(14) 

(15) 



p(—Kkl + p : 

-K 2 k%kfp 2 [5/c 2 fc| + 6/c 2 - 2q) qs + 24^ 2 s 2 ] 

+2K 2 k klp 3 [5kfk% + k 2 q (Uk\ - 9k iq - 6q 2 ) s + 6q 2 (5ki + 3q) s 2 ] 

-2nk k lP A [5kfk% + 19k$k$qs + 9k\k 2 q 2 s 2 + 6k 2 q 4 s 2 + 3k iq 3 s {-2k\ + 8k 2 s + s 2 )] } (16) 



where s = ki(Su) + k 2 + (5m) = 2 i^{^o&i - k\k 2 - k 2 q - k-\q + [4kik q(k 2 + A;_i) + 
{k\k 2 — k\kQ + k 2 q + k^iq) 2 ] ^ 2 j is the average number of membrane-bound substrates, ko = qoS&, 
k = kokik 2 , p = k\k 2 + qs, and, finally, T is the time step over which Sb changes by a relatively 
small amount, but many membrane reactions happen. 

By analogy to the MM reaction, Eqs. ( Io][l2" ), the results from Step 2 allow for simulations of 
the whole reaction scheme in one Langevin-like step: 



SQ(t) 
S B (t + T) 
P(t + T) 



v(t,T), 

5 B (t) - SQ(t) + J{t)T, 
P(t)+5Q(t), 



(17) 
(18) 
(19) 



where rj is a random variable with the cumulants as in Eqs. (13 16), to be generated as in Methods: 



Simulations with near-Gaussian distributions, and J{t) is an external current, such as production 
or decay of the bulk substrate in other cellular processes. 



D. Fano factor in a single molecule experiment 

In analyses of single molecule experiments, one often calculates the ratio of the variance of the 
reaction events distribution to its mean, called the Fano factor^ES 

F = ^. (20) 
ci 



13 




FIG. 5: Comparison of the analytically calculated Fano factor for the Sb — ► P reaction, Eq. ( 15 1, to direct 
Monte Carlo simulations with the Gillespie algorithrrP^. Here we use q = 0.02, k\ = 0.05, ^2 = 1, evolution 
time T — 10000 (in arbitrary units). Each numerical data point was obtained by averaging 10000 simulation 
runs. 

The Fano factor is zero for deterministic systems and one for a totally random process described 
by a Poisson number of reactions. As a result, the Fano factor provides a natural quantification 
of the importance of the stochastic effects in the studied process. In vivo, it can be measured, 
for example, by tagging the enzyme with a fluorescent label that emits light every time a product 
molecule is releasecP. 

Traditionally, to compare experimental data to a mathematical model, one would simulate 
the model using the Gillespie kinetic Monte Carlo algorithm^!, which is a slow and laborious 
process that takes a long time to converge to the necessary accuracy (see below). In contrast, our 
coarse-graining approximations yields an analytic expression for the Fano factor of the S*b — > P 



transformation via Eq. ( 15 ) . This illustrates a first practical utility of our coarse-graining approach. 
In Fig. [5] we compare this analytical expression, derived under the aforementioned quasi-stationary 
assumption, with stochastic simulations for the full set of elementary reactions in Fig. [3^a). The 
results are in an excellent agreement, illustrating the power of the analytical approach. 

Note that the Fano factor is generally different from unity, indicating a non-Poissonian behavior 
of the complex reaction. The backwards decay of C, parameterized by k-\, adds extra randomiza- 
tion and thus larger values of k—i increases the Fano factor F. At another extreme, when k-i = 0, 
the Fano factor may be as small as 1/2, indicating that then the entire Sb —> P chain can be de- 
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scribed by the sum of two Poisson events with similar rates, which halves the Fano factor. Finally, 
when q = 0, i.e., the substrates are removed from the membrane only via conversion to products, 
the Fano factor F is one. This is because here the only stochasticity in the problem arises from 
Poisson membrane binding: on long time-scales, all bound substrates will eventually get converted 
to the outgoing flux. 



E. Computational complexity of coarse-grained simulations 

As we alluded to before, in addition to analytical results, such as the expression for the Fano 
factor, we expect the coarse-graining approach to be particularly useful for stochastic simulations in 
systems biology. This is due to an essential speedup provided by the method over traditional simu- 
lation techniques, such as, in particular, the Gillespie algorithm^, to which most other approaches 
are compared too. 

Indeed, for the model analyzed in this work, the computational complexity of a single Gillespie 
simulation run is O (^M^j, where M = 5 is the number of reactions in the system, and T is the 
duration of the simulated dynamics. In contrast, the complexity of the coarse-grained approach 
scales as O 

since we have removed the internal species and simulate the dynamics 
in steps of ~ T, instead of steps ~ te- However, since the coarse-grained approach requires 
generation of complicated random numbers, the actual reduction in the complexity is unclear. 
More importantly, the Gillespie algorithm is (statistically) exact, while our analysis relies on quasi- 
stationary assumptions. Therefore, to gauge the practical utility of our approach in reducing the 
simulation time while retaining a high accuracy, we benchmark it against the Gillespie algorithm. 
We do this for a single MM enzyme first, and then progress to the full five reaction model of the 
enzyme on a membrane. Details of the computer system used for the benchmarking can be found 
in Methods: Simulations details. 

The Michaelis-Menten model: We consider a MM enzyme with Sm = 140 = const, k% = 0.01, 
= 2.0, tt2 = 1.0. We analyze the number of product molecules produced by this enzyme 
over time 5t = 35, with the enzyme initially in the (stochastic) steady state. To strain both 
methods, we require a very high simulation accuracy, namely convergence of the fourth moment 
of the product flux distribution to two significant digits. For both methods, this means over 10 
millions realizations of the same evolution. 

In Tbl. [I] we report the results of our simulations. We see that the analytical coarse-grained 
results differ from the exact Gillespie simulations by, at most, two per cent, which is an expected 
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Cumulants 


Gillespie 


Coarse-grained 


Analytical prediction 


Cl 


11.24 ± 0.01 


11.14 ± 0.01 


11.14 


C2/C1 


0.843 ± 0.001 


0.855 ± 0.001 


0.855 


C3/C1 


0.613 ± 0.004 


0.628 ± 0.004 


0.628 


c 4 /ci 


0.32 ± 0.02 


0.32 ± 0.02 


0.319 


time 


8 min 45 s 


12 s 


N/A 



TABLE I: Comparison of the Gillespie and the coarse-grained simulation algorithms. The numbers are 
reported for 12 million realizations of the same evolution for each of the methods. To highlight deviations 
from the Poisson and the Gaussian statistics, we provide ratios of the higher order cumulants to the mean 
of the product flux distribution. In the last column, we report analytical predictions, Eqs. 
from the quasi-steady state approximation to the CGF. 

deviation given the quality of the steady-state approximation. Further, the Langevin-like coarse- 
grained simulations, which accounted for the first four cumulants of the reaction events distribution, 
as in Methods: Simulations with near-Gaussian distributions, produce results nearly indistinguish- 
able from the analytical expressions, and, again, at most two per cent different from the Gillespie 
runs. Yet coarse-grained simulations require only l/40th the time of their Gillespie analogue since 
the time step is large, 5t = 35. 

It is hard to imagine a practical situation in modern molecular biology where the kinetic pa- 
rameters are known well enough so that the few per cent discrepancy between the full and the 
coarse-grained simulations matters. Yet the reduction of the simulation time by the factor of over 
40 is certainly a tangible improvement. 

The Michaelis-Menten enzyme on a membrane: As the next step, we compare the algorithms 
when the MM enzyme is embedded in the membrane, and random substrate-membrane bind- 
ing/unbinding events happen in addition to the MM product production [i.e., the coarse-graining 
is stopped after Step 1, Fig. (3^b)]. We use parameters k\ = 0.02, A;_i = 2, hi = 1, q = 0.01, and 
QqSb = 1-5. Total time of the evolution is T = 1000, and the initial number of the substrates on 
the membrane is 5m = 0) = 120. Finally, the relaxation time of a typical fluctuation of Sm can 
be estimated as tm ~ l/[q + (S/cmm/^Sm)] ~ wnere ^mm is the rate of the Michaelis-Menten 
reaction for a given Sm- 

On time scales of ~ 5t, the binding/ unbinding events are Poisson distributed and can be simu- 
lated by the standard r- leaping techniques 1 ^ . However, for consistency, we simulate them similarly 
to the MM-reaction, approximating the Poisson distribution by its Gram-Charlier-series. 

Since, in this setup, many events are futile membrane bindings-unbinding, the Gillespie simula- 



6]|9l, obtained 
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cumulant 


Gillespie 


Coarse-grained (stepl) Coarse-grained (step 2) 


Analytical prediction 


Cl 


418.7 ± 0.1 


420.0 ± 0.1 


418.9 ± 0.1 


418.9 


ca/ci 


0.771 ± 0.001 


0.764 ± 0.002 


0.768 ± 0.001 


0.767 


C3/C1 


0.50 ± 0.03 


0.46 ± 0.08 


0.48 ± 0.03 


0.472 


time 


lh 14min 


lmin 17s 


Is 


N/A 



TABLE II: Comparison of cumulants of the product flux for the full system membrane calculated using the 
Gillespie simulations, the coarse-grained simulations (with St — 20), the fully coarse-grained simulations 
(St = 1000), and the analytical predictions. The data were averaged over 10 6 realizations, sufficient to 
estimate the third cumulant to two significant digits. 

tions become quite time consuming, and we only achieve convergence of the first three cumulants 
in a reasonable time, with the third cumulant known to about two significant digits. For the 
coarse-grained runs, we choose the time step 5t = 20 <C tm, and we model all three coarse-grained 
reactions preserving their first three cumulants only. In this example, our coarse-grained approach 
speeds simulations 60-fold, yet it still provides accurate results for the first three cumulants of the 
distribution, see Tbl. [TTJ 

The full Sb — ► P conversion: Finally, we perform similar benchmarking for the Gillespie simula- 
tions and the coarse-grained simulations of the fully coarse-grained system, represented as a single 
complex reaction S-q — ► P. The third column in Tbl. [TT] presents the data for this coarse-graining 
level. Note that representing all five reactions as a single one results in a dramatic speedup of 
about 4000. This number relates to the ratio of the slow and the fast time scales in the problem, 
but also to the fact that futile bindings-unbindings are leaped over in the coarse-grained scheme. 

F. Generalizations to a network of reactions 

As discussed in detail in the original literature (the best pedagogical exposition is in Ref. I29j) . 
in the stochastic path integral formalism, a network of M reactions with N chemical species 
(cf. Fig. [6]) is generally described by 2MN ordinary differential equations specifying the classical 
(saddle point) solution of the corresponding path integral. Methods: Coarse- graining all membrane 
reactions provides a particular example of this technique, and we refer the interested reader the 
original work®. Here, we build on the result!®' and focus on developing a relatively simple, yet 
general coarse-graining procedure for more complex reaction networks. 

At intermediate time scales, 5t, many fast reactions connecting various slow variables can be con- 
sidered statistically independent. Therefore, in the path integral, every separate chain of reactions 
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(b) 



H 



12 





H 
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FIG. 6: Schematic coarse-graining of a network of reactions, (a) This network has M — 10 reactions (red 
arrows) and N = 8 species, of which three are slow (large circles), and five are fast (small circles), (b) 
Dynamics of each fast node can be integrated out, leaving effective, coarse-grained, pairwise fluxes among 
the slow nodes. The fluxes along entire pathways connecting the slow pairs (blue arrows) are labeled by the 
corresponding effective Hamiltonians H^. Note that, for reversible pathways (i/12 in our example), the flux 
may be positive or negative (two-sided arrow), and it is strictly non-negative for the irreversible pathways 
(one-sided arrows). 

that connects two slow variables simply adds a separate contribution to the effective Hamiltonian. 

Namely, let's enumerate slow chemical species by fi, u, Chains of fast reactions connecting them 

can be marked by pairs of indexes, e.g., fiv (cf. Figj6J). An entire such chain will contribute a single 
effective Hamiltonian term, H^ U ({N}, {x}, {xc})> to the full CGF of the slow fluxes, where {N}, 
{x}, an d {xc} are the set 01 the slow species and the conjugate, counting variables. If necessary, 
the geometric correction to the CGF, 5g e om({iV}, {x}> {xc})i can also be written out. Overall, 



S({XC},T) = *£S^ m ({N(t)},{ X (t)},{xc},T) 

£ ixA + E HvWMh {X(t)h ixc}) 



dt 



\1<V 



This expression provides for the following coarse-graining procedure. First, one finds a time 
scale St, small enough for the slow species to be considered as almost static, and yet fast enough for 
the fast ones to equilibrate. If the fast species consist only of a few degrees of freedom, like in the 
case of a single enzyme, one can derive the CGF of the transformations mediated by these species 
either by using techniques presented in this article (cf. Methods: Coarse-graining the Michaelis 
Menten reaction), or discussed previously^SEZB3]_ jf ms tead the fast species are mesoscopic, one 
can use the stochastic path integral technique to derive the CGF by analogy with Step 2 of this 



18 



article. 

At the next step, these expressions for the CGFs of the fast species are incorporated into the 
stochastic path integral over the abundances of the slow variables. For this, one writes down the 



the full effective Hamiltonian, Eq. (21 ), assumes adiabatic evolution, and solves the ensuing saddle 
point equations. The extremum of the effective Hamiltonian determines the cumulant generating 
function. For hierarchies of time scales, this reduction procedure is repeated at every level of the 
hierarchy. 



III. DISCUSSION 



As biology continues to undergo the transformation from a qualitative, descriptive science to 
a quantitative one, it is expected that more and more rigorous analysis techniques developed in 
physics, chemistry, mathematics, and engineering will find suitable applications in the biological 
domain. This article represents one such example, where adiabatic approach, paired with the 
stochastic path integral formalism of mesoscopic statistical physics, allows one to coarse-grain 
stochastic biochemical kinetics systems. 

For stiff systems with a pronounced separation of time scales, our technique eliminates relatively 
fast variables. It reduces stochastic networks to only the relatively slow species, coupled by complex 
interactions that accounts for the decimated nodes. The simplified system is smaller, non-stiff, and 
hence easier to analyze and simulate, resulting, in particular, in orders of magnitude improvement 
in the computational complexity of the simulations. Thus we believe that the approach has a 
potential to revolutionize the field of simulations in systems biology, at least for systems with the 
separation of time scales. 

Fortunately, such separation occurs more prominently in Nature than one would intuitively 
suspect. Consider for example, the system given in Fig.[7J briefly mentioned in the Introduction. A 
molecule must be modified on n sites in an arbitrary order to move from the inactive (0,0,..., 0) to 
the active (1,1,..., 1) state. The kinetic diagram for this system is an n-dimensional hypercube, 
and the number of states of the molecule with m modified sites is ("j. Therefore, if the total 
number of molecules is N, then a typical m times modified state will have N m = N/ ( n ) molecules 
in it. This number may be quite small, ensuring the need for a full stochastic analysis. More 
importantly, it is quite different from either JV m _i or iV m+ i, e.g., N m /N m+ i = (m + l)/(n — m). 
As we discussed at length above, different occupancies result in the separation of time scales, and, 
on practice, the adiabatic approximation works quite well when this separation is a factor of only 
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(1,1, ..,1) 





1 


A 




1 




^/(0,1,...,0) 


V 


(1,0, 


...,o) 



(0,0,..., 0) 



FIG. 7: A molecule must be modified on n sites (here n = 3) in an arbitrary order to get activated. and 
1 indicate a non-modified/modified site, respectively. The number of states with m < n modified sites is 
quite different for different m's, which allows for a separation of time scales, as explained in the text. 



a few. 

In addition to the analysis and simulations, our adiabatic path integral-based coarse-graining 
scheme simplifies interpretation and understanding. For example, in certain cases, the Fano factor 



of the complex Sb —> P reaction, Eq. (15), approaches unity, suggesting a simplified, yet rigorous, 
interpretation with the entire reaction replaced by a simple Poisson step. Hence the list of relevant, 
important parameters may be smaller than suggested by the ab initio description of the system, 
aiding the understanding of the involved processes and decreasing the effective number of bio- 
chemical parameters that must be measured experimentally. Recent theoretical analysis suggests 
that this may be a universal property of biochemical network d 44 * 45 !, with larger networks having 
proportionally fewer relevant parameters. Thus one may hope that the rigorous identification of 
the relevant degrees of freedom presented here will become even more powerful as larger, more 
realistic systems are considered. 

We demonstrated the strength of our coarse-graining approach in analytical calculations of the 
Fano factor for the model system (relevant for single molecule experiments), and in numerical 
simulations, where the method substantially decreased the computational complexity. While im- 
pressive, this is still far from being able to coarse-grain large, cellular scale reaction networks. 
However, we believe that some important properties of our approach suggest that it may serve as 
an excellent starting point. Namely, 

• We reduce a system of stochastic differential equations to a similar number of deterministic 
ones, which is a substantial simplification (see Methods). 
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• Our method can operate with arbitrarily long series of moments of the whole probability 
distribution of reaction events; i.e., it keeps track of mesoscopic fluctuations and even of rare 
events®!. 

• The technique is very suitable for stiff systems, allowing to reduce the complexity by means 
of standard adiabatic approximations, well developed in classical and quantum physics. 

• The stochastic path integral approach can deal with species that have copy numbers of 
order unity, which are ubiquitous in biological systems. This is not true for many other 
coarse-graining techniques. 

• Finally, unlike many previous approaches, the stochastic path integral is rigorous, can be 
justified mathematically, and allows for controlled approximations. 

In the forthcoming publications, we expect to show how these advantageous properties of the 
adiabatic path integral technique allow to coarse-grain many standard small and medium-sized 
biochemical networks. 



If during a time interval 5t the rate of an elementary chemical reaction is (almost) constant, then 
all reaction events are independent, and their number can be approximated as a Poisson variable. 
In its turn, the CGF of a Poisson variable is 



IV. METHODS 



A. Moments generating functions for elementary chemical reactions 



S( X ) = Ke lx ~ l)St, 



(22) 



where k is the Poisson rate. 



In our case, Fig. [2] two reactions satisfy these constraints for te <C St -C tm- substrate binding 
and unbinding to/from the membrane. Therefore, for these reactions we have: 



qoS B (t)(e^ - l)St 



(23) 



qS M (t)(e^ - l)5t. 



(24) 
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B. Coarse-graining the Michaelis-Menten reaction 

Consider the Sm — ► P reaction, described mathematically as in Eq. ([!]): 

S M + E ^C^E + P. 

k-i 



(25) 



The probabilities of transitions between bound (Pb) an d unbound (P u ) states of the enzyme are 
given by a simple two state Markov process 



(26) 



d 


Pu 




kiSyi —k^i — k 2 




Pu 


dt 


Pb 




-hSu k-i + k 2 




Pb 



where P u + Pb = 1. 

Lets introduce the MGF for the number of Sm P transitions 

oo 

=•53 (x) 



Z 3 ( X ,St) = & 



J2P(^Q3 = n\dt)e 



in X 



(27) 



n=0 



Here 5Q^ stands for a charge transferred over time 5t in a reaction fj,, and \i = 3 is the MM 
reaction in toy model, Fig. [2J Using Eqs. (26, 27), one can sIkW 3 - ^ 31 * 43 !, that 2%(x, St) satisfies a 
Schrodinger-like equation with a x-dependent Hamiltonian, leading to a formal solution 



-HMM(X,t)St 



p(*o), 



where 1 



z 3 ( x ,st) = r 

[1, 1) is the unit vector, p(to) is the probability vector of initial enzyme states, and 

A;iiV s - k 2 e ix 

-hN s k-i + k 2 



(28) 



Hmm(x) 



(29) 



The Hamiltonian, analogous to Eq. (29), can be derived for a very wide class of kinetic 



schemes^ 26 ! 43 ! 46 * 4 ^ , allowing for a relatively straightforward extension of our methods. 



The solution, Eq. (28), can be simplified considerably if the reaction is considered in a quasi- 
steady state approximation, that is P u is equilibrated at a current value of the other parameters. 
This means that the time on which the reaction is being studied, 5t ~ tm, is much larger than a 



characteristic time of a single enzyme turnover, te, so we can consider 5t — > 00 in Eq. (28). Then 



only the eigenvalue Aq(x) of the Hamiltonian Hmm(x) with the smallest real part is relevant, and 



-\ ( X )St 



(30) 



It is possible to incorporate a slow time dependence of the parameters into this answer. By 
analogy with the quantum mechanical Berry phase^ 31 * 47 !, the lowest order non-adiabatic correction 
can be expressed as a geometric phase 



J c A.dk-/#Ao(jc,t) 



(31) 
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where A = (uo(x)\^k u o(x)) ; k is the vector in the parameter space, which draws a contour c 
during the parameter evolution, and («o(x)l anci |^o(x)) are the left an d the right eigenvectors of 



Hmm(Xi t) corresponding to the instantaneous eigenvalue Ao(Xj The first term in Eq. (31 ) is the 
geometric phase, which is responsible for various ratchet-like fluxe d 27 ' 48 ' 49 '. 

After elimination of the fast degrees of freedom, the geometric phase gives rise to magnetic field- 
like corrections to the evolution of the slow variables. However, since these corrections depend on 
time derivatives of the slow variables, they usually are small and can be disregarded, unless they 
break some important symmetry, such as the detailed balance^ESl, or the leading non-geometric 
term is zero. In our model, the geometric effects are negligible when compared to the dominant 
contribution when te/tm ~ I/Smj an d we deemphasize them in most derivations. However, we 
keep the geometric terms in several formal expressions for completeness, and the reader should be 
able to track its effects if desired. 

Reading the value of Ao(x) from Ref. [271 we conclude that the number of particles converted 
from Sm to P over time St, te <C St < tm in the adiabatic (MM) limit is described by the following 
CGF: 



Ss(Xi 5t) = <Sgcom(x <Sm, Sm)+ 
St 

7 



(k-i + k 2 + S M h) + \/ O-i + k 2 + Suh) 2 + ASukiktie* - 1) 



(32) 



Simulations with near-Gaussian distributions 



A probability distribution P{SQ) with known cumulants ci, C2,..., can be written as a limited 
Gram-Charlier expansion®! 



P(8Q)^^(SQ, Cl ,c 2 ) 



, , c 3 (y 3 -y) c 4 fa 4 -6y 2 + 3) 4(y e - 15y 4 + 45y 2 - 15) 
6cf 24c| + 724 



(33) 

where y = (SQ — ci)/y / C2 and ^f(SQ, c\, C2) is the Gaussian density with the mean c\ and the vari- 
ance C2- The leading term in the series is a standard Gaussian approximation, and the subsequent 
terms correctly account for skewness, kurtosis, etc. Note that if all cumulants scale similarly, as is 
true for our near-Gaussian case, then the terms in the series become progressively smaller, ensuring 
good approximations in practice. 

While the Gram-Charlier expansion provides a reasonable approximation to P, generation of 
random samples from such a non-Gaussian distribution is still a difficult task. However, if, instead 
of the random numbers per se, the goal is to calculate the expectation of some function f{SQ) over 
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the distribution P, (f(6Q))p, then the importance sampling technique^ can be used. Specifically, 
we generate a Gaussian random number 5Q according to ^(5Q, c\,C2) and define its importance 
factor according to its relative probability in the reference normal distribution and the desired 
Gram-Charlier approximation 

,= P (*Q> . (34) 

After generating N such random numbers SQ V , v = 1, . . . , N, we obtain the needed expectation 
values as 

/A EHi Vuf(SQ u ) 

\J/P = • ( 35 ) 

2Zu=x Vu 

If a current random number draw represents just one reaction in a larger reaction network, then 
the overall importance factor of a Monte Carlo realization is a product of the factors for each of 
the random numbers drawn within it. 

Note that the method reduces the complexity of simulations to that of a simple Gaussian, 
Langevin process with a small burden of (a) evaluating an algebraic expression for the Gram- 
Charlier expansion, and (b) keeping track of the importance factor for each of the Monte Carlo 
runs. Yet, at least in principle, this small computational investment allows to account for an 
arbitrary number of cumulants of the involved variables. To illustrate this, in Fig. [8j we compare 
the Gram-Charlier-based, importance-sampling corrected simulations of the MM reaction flux to 
the exact results in Results: Step 1. Introducing just the third and the fourth cumulant makes the 
simulations almost indistinguishable from the exact results. 

We end this section with a note of caution: the Gram-Charlier series produces approximations 
that are not necessarily positive and hence are not, strictly speaking, probability distributions. 
However, the leading Gaussian term decreases so fast that this may not matter in practice. Indeed, 
in our analysis, we simply rejected any random number that had a negative importance correction, 
and the agreement with the analytical results was still superb. 

However, this simplistic solution becomes inadequate for lengthy simulations, where the prob- 
ability that one of random numbers in a long chain of events falls into a badly approximated 
region of the distribution approaches one. Then the importance factor of the entire chain of events 
becomes incorrect, spoiling the convergence. In these situations, other approaches for generating 
random numbers should be used. A prominent candidate is the well-known acceptance-rejection 
method^. Since the true distributions we are interested in are near-Gaussian, a Gaussian with a 
slightly larger variance will be an envelope function for the Gram-Charlier approximation to the 
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FIG. 8: Comparison of an exact discrete distribution of product molecules generated by MM-enzymc 
(discrete points), with the fit by continuous approximation by leading terms of Gram-Charlier series. Left 
column compares the exact result to the Gaussian approximation with the same first two cumulants. Central 
column shows improvement of the fit due to inclusion of the third cumulant correction. Including the fourth 
cumulant (right column) makes the approximation and the exact result virtually indistinguishable. For these 
plots, we used 5m = 140 = const, k\ = 0.02, fc_i = 2., k 2 = 1., q = 0.01, and time step size St = 35 (see 
Introduction: The model and Results for explanation of the parameters). 

true distribution. Then the average random number acceptance probability will be similar to the 
ratio of the true and the envelope standard deviations, and it can be made arbitrary high. Then the 
rejection approach will require just a bit more than one normal and one uniform random number to 
generate a single sample from the underlying Gram-Charlier expansion. The orders-of-magnitude 
gain due to the transition to the coarse-grained description should fully compensate for this loss. 
Note that, in this case, the negativity of the series is not a problem since it will lead to an incorrect 
rejection of a single, highly improbable sample, rather than an entire sampling trajectory. 



25 



D. Coarse- graining all membrane reactions 

To complete the coarse-graining step that connects Figs. |3|b) andj^c), we look for the MGF 
of the total number of products Qp produced over time T ~ tb- 

oo 

Zixc) = e S(xc) = P(Qp\T)e iQpXc . (36) 
Q P =0 

For this, we discretize the time into intervals t k of durations St, and we introduce random variables 
SQ^itf.) (fx = 1, 2, 3), which represent the number of each of the three different reactions in Fig. [3|( 
(membrane binding, unbinding, and MM conversion) during each time interval. The probability 
distributions of SQ^{t k ) are given by inverse Fourier transforms of the corresponding MGFs: 

P{SQ^{t k )) = J dx M (tfc)e- i ^ ( ' fc),5Q '' ( * fc)+ ^ ( ^ (tfc) ' 5B( ' fc))5t , (37) 

where the CGF is 

S fl (x,S B ) = Hp(x,S B )6t. (38) 

Followin g ^ 28 * 29 * 3 1 the MGF of the total number of product molecules created during time 
interval (0,T) is given by the path integral over all possible trajectories of SQ^(t k ) and Su.(tk)'- 

<■-■' " "•<■■}■ - I LJ^\\{i,.)\ III/ UMj.Ah.) i'\>)(j n i!,)\e XcY:t k SQ3{tk) 



3 5( X c,T) = {e i XC Q P) = J D s M (t k )im J DSQ^h) P[8Q„(t k )} 



k M 

x S(S M (t k+1 ) - S M (t k ) - 5Qi(t k ) + SQ 2 (t k ) + SQ 3 (t k )). (39) 

Here we used the fact that Qp = Yl^Qsi^k)- 

k 

The 5-function in the path integral expresses the conservation law for the slowly changing 
number of substrate molecules Sm- We rewrite it as an inverse Fourier transform, 

S(S M (t k +i) - S M (t k ) - SQx(t h ) + SQ 2 (t k ) + SQ 3 (t k )) = 
1 



2tt 



dxu(t k ) exp {ixM{t k ) [S M {t k +i) - S M {t k ) - SQi(t k ) + SQ 2 {t k ) + SQ 3 (t k )}}, (40) 



and we substitute the expression together with Eq. (37) into Eq. (39). Then the integration over 
8Qu,(t) produces new (^-functions over Xu, which, in turn, are removed by integration over Xu(^fc)- 
This leads to an expression for the MGF: 

cT 



J DS M J DxMexp J dt[ixuSM + H(S u ,Xm,Xc)}, (41) 
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where 



H(S m ,Xm,Xc) =Hi(-XM,S M ,t) + H 2 (xM,S M ,t) + H 3 (xm + Xc, Sm, t) (42) 
=q S B (e- i ™ - 1) + S M q(e lXM - 1) + 



- + k 2 + S M h) + \j (fc-i + k 2 + S M h) 2 + 4S M &iA:2(e i X M+ xc - 1) 

(43) 

Notice that, unlike in the original work on the stochastic path integral, which assumed all com- 
ponent reactions to be Poisson, here H3 is the CGF of the entire complex MM reaction. This we 



read as the coefficient in front of 5t in Eq. (32 ), and it is clearly non- Poisson. This ability to include 
subsystems with small number of degrees of freedom, such as a single Michaelis-Menten enzyme 
or a stochastic gene expression, into coarse-graining mechanism based on the the stochastic path 
integral techniques opens doors to application of the method to a wide variety of coarse-graining 
problems. 

Since Sm S> 1 , this path integral is dominated by the classical solution of the equations motion 
(i. e., the saddle point), which, near the steady state, are 

Sm = 0, X m = 0, (44) 
9H dH 
oxm oS M 



Let Xcl(xc) and Sm,ci(xc) solve Eq (45). Then the cumulants generating function in the quasi- 
steady state approximation is 

S( XC ,T) = TH(S Mtd ( X c),Xci(xc),Xc) (46) 

This formally completes the last step of the coarse-graining by deriving the cumulant generating 
function for the number of complex Sb — * P transformation over long times. 

E. Simulations details 

All computer simulations were performed using Fortran 90 code, on a single processor AMD 
Barton 2500 (1.83 GHz), and operating system Windows 2000. 
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